In this article, we demonstrate implementing a TensorFlow Boosted Trees classifier model by an example. The details regarding this dataset can be found in the Diagnostic Wisconsin Breast Cancer Database.

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets

data = datasets.load_breast_cancer()
Data = pd.DataFrame(data['data'], columns = [x.title() for x in data['feature_names']])
Labels_dict = dict(zip(list(np.sort(np.unique(data['target'].tolist()))),
                       list([x.title() for x in data['target_names']])))
Target = 'Diagnosis'
Data[Target] = data['target']
# Data['Diagnosis'] = data['target'].replace(Labels_dict)
display(Data)
print(data['DESCR'])
(Output: preview of the feature DataFrame, 569 rows × 31 columns: 30 numeric features, from Mean Radius through Worst Fractal Dimension, plus the Diagnosis target.)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radius, field
        10 is Radius SE, field 20 is Worst Radius.

        - class:
                - WDBC-Malignant
                - WDBC-Benign

    :Summary Statistics:

    ===================================== ====== ======
                                           Min    Max
    ===================================== ====== ======
    radius (mean):                        6.981  28.11
    texture (mean):                       9.71   39.28
    perimeter (mean):                     43.79  188.5
    area (mean):                          143.5  2501.0
    smoothness (mean):                    0.053  0.163
    compactness (mean):                   0.019  0.345
    concavity (mean):                     0.0    0.427
    concave points (mean):                0.0    0.201
    symmetry (mean):                      0.106  0.304
    fractal dimension (mean):             0.05   0.097
    radius (standard error):              0.112  2.873
    texture (standard error):             0.36   4.885
    perimeter (standard error):           0.757  21.98
    area (standard error):                6.802  542.2
    smoothness (standard error):          0.002  0.031
    compactness (standard error):         0.002  0.135
    concavity (standard error):           0.0    0.396
    concave points (standard error):      0.0    0.053
    symmetry (standard error):            0.008  0.079
    fractal dimension (standard error):   0.001  0.03
    radius (worst):                       7.93   36.04
    texture (worst):                      12.02  49.54
    perimeter (worst):                    50.41  251.2
    area (worst):                         185.2  4254.0
    smoothness (worst):                   0.071  0.223
    compactness (worst):                  0.027  1.058
    concavity (worst):                    0.0    1.252
    concave points (worst):               0.0    0.291
    symmetry (worst):                     0.156  0.664
    fractal dimension (worst):            0.055  0.208
    ===================================== ====== ======

    :Missing Attribute Values: None

    :Class Distribution: 212 - Malignant, 357 - Benign

    :Creator:  Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian

    :Donor: Nick Street

    :Date: November, 1995

This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.
https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle
aspirate (FNA) of a breast mass.  They describe
characteristics of the cell nuclei present in the image.

Separating plane described above was obtained using
Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
Construction Via Linear Programming." Proceedings of the 4th
Midwest Artificial Intelligence and Cognitive Science Society,
pp. 97-101, 1992], a classification method which uses linear
programming to construct a decision tree.  Relevant features
were selected using an exhaustive search in the space of 1-4
features and 1-3 separating planes.

The actual linear program used to obtain the separating plane
in the 3-dimensional space is that described in:
[K. P. Bennett and O. L. Mangasarian: "Robust Linear
Programming Discrimination of Two Linearly Inseparable Sets",
Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

.. topic:: References

   - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction 
     for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on 
     Electronic Imaging: Science and Technology, volume 1905, pages 861-870,
     San Jose, CA, 1993.
   - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and 
     prognosis via linear programming. Operations Research, 43(4), pages 570-577, 
     July-August 1995.
   - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques
     to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 
     163-171.

Features with high variance

The features in this dataset differ in scale by several orders of magnitude (compare Mean Area with Mean Fractal Dimension), and such disparate variances can hurt the modeling process. For this reason, we standardize the features by removing the mean and scaling to unit variance.

In [2]:
from sklearn import preprocessing

X = Data.drop(columns = [Target])
y = Data[Target]
Temp = X.var().to_frame(name= 'Variance (Original)').round(4)
scaler = preprocessing.StandardScaler()
X_std = scaler.fit_transform(X)
X_std = pd.DataFrame(data = X_std, columns = X.columns)
Temp = Temp.join(X_std.var().to_frame(name= 'Variance (Normalized)').round(4))
display(Temp.style.background_gradient(cmap='Reds', subset = 'Variance (Original)')\
        .background_gradient(cmap='Greens', subset = 'Variance (Normalized)'))
del Temp
  Variance (Original) Variance (Normalized)
Mean Radius 12.418900 1.001800
Mean Texture 18.498900 1.001800
Mean Perimeter 590.440500 1.001800
Mean Area 123843.554300 1.001800
Mean Smoothness 0.000200 1.001800
Mean Compactness 0.002800 1.001800
Mean Concavity 0.006400 1.001800
Mean Concave Points 0.001500 1.001800
Mean Symmetry 0.000800 1.001800
Mean Fractal Dimension 0.000000 1.001800
Radius Error 0.076900 1.001800
Texture Error 0.304300 1.001800
Perimeter Error 4.087900 1.001800
Area Error 2069.431600 1.001800
Smoothness Error 0.000000 1.001800
Compactness Error 0.000300 1.001800
Concavity Error 0.000900 1.001800
Concave Points Error 0.000000 1.001800
Symmetry Error 0.000100 1.001800
Fractal Dimension Error 0.000000 1.001800
Worst Radius 23.360200 1.001800
Worst Texture 37.776500 1.001800
Worst Perimeter 1129.130800 1.001800
Worst Area 324167.385100 1.001800
Worst Smoothness 0.000500 1.001800
Worst Compactness 0.024800 1.001800
Worst Concavity 0.043500 1.001800
Worst Concave Points 0.004300 1.001800
Worst Symmetry 0.003800 1.001800
Worst Fractal Dimension 0.000300 1.001800
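The normalized variances come out as 1.0018 rather than exactly 1 because pandas' .var() uses the sample estimator (ddof=1) while StandardScaler scales by the population standard deviation (ddof=0); the ratio is n/(n-1) with n = 569. A minimal sketch on synthetic data (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# pandas .var() divides by (n - 1), StandardScaler's std by n,
# so standardized data has sample variance n / (n - 1), not 1.
n = 569
x = np.random.default_rng(0).normal(size=n)
x_std = (x - x.mean()) / x.std()           # ddof=0, as in StandardScaler
print(round(float(pd.Series(x_std).var()), 4))  # 1.0018
print(round(n / (n - 1), 4))                    # 1.0018
```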

Train and Test sets

In [3]:
import plotly.express as px
from HD_DeepLearning import DatasetTargetDist

Pull = [0] * (len(Labels_dict) - 1)
Pull.append(.05)
PD = dict(PieColors = ['SeaGreen','FireBrick'], TableColors = ['Navy','White'], hole = .4,
          column_widths=[0.6, 0.4], textfont = 14, height = 400, tablecolumnwidth = [0.25, 0.15, 0.15],
          pull = Pull, legend_title = Target, title_x = 0.5, title_y = .9, pie_legend = [0.01, 0.01])
del Pull
DatasetTargetDist(Data, Target, Labels_dict, PD, orientation= 'columns')

StratifiedShuffleSplit returns stratified randomized splits: each set contains approximately the same percentage of samples of each target class as the complete set. (StratifiedKFold provides the analogous behavior for k-fold cross-validation.)

In [4]:
from sklearn.model_selection import StratifiedShuffleSplit

def HD_StratifiedShuffleSplit(X, y, Test_Size = 0.3):
    sss = StratifiedShuffleSplit(n_splits=1, test_size=Test_Size, random_state=42)
    for train_index, test_index in sss.split(X, y):
        # split() yields positional indices, so use .iloc for pandas objects
        if isinstance(X, pd.DataFrame):
            X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        else:
            X_train, X_test = X[train_index], X[test_index]
        if isinstance(y, pd.Series):
            y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        else:
            y_train, y_test = y[train_index], y[test_index]
    del sss
    return X_train, y_train, X_test, y_test

X_train, y_train, X_test, y_test = HD_StratifiedShuffleSplit(X_std, y)
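For reference, the same stratified behavior can be obtained in a single call with scikit-learn's train_test_split and its stratify argument. A minimal sketch on toy labels (names are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with a 2:1 class imbalance.
y_toy = np.array([0] * 200 + [1] * 100)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

Xa, Xb, ya, yb = train_test_split(X_toy, y_toy, test_size=0.3,
                                  stratify=y_toy, random_state=42)
# Class proportions are preserved in both splits.
print(np.bincount(ya) / len(ya))  # ~[0.667, 0.333]
print(np.bincount(yb) / len(yb))  # ~[0.667, 0.333]
```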

from HD_DeepLearning import Train_Test_Dist  
PD.update(dict(column_widths=[0.3, 0.3, 0.3], tablecolumnwidth = [0.2, 0.4], height = 550, legend_title = Target))

Train_Test_Dist(X_train, y_train, X_test, y_test, PD, Labels_dict)
#
import tensorflow as tf
# y_train = tf.keras.utils.to_categorical(y_train, num_classes=len(Labels_dict))
# y_test = tf.keras.utils.to_categorical(y_test, num_classes=len(Labels_dict))

Feature Columns

Create the feature columns, using the original numeric columns as is and one-hot-encoding categorical variables.
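As an aside, the one-hot encoding performed by an indicator column corresponds conceptually to pandas' get_dummies. A small sketch on a toy categorical column (names are illustrative):

```python
import pandas as pd

# One column per category; each row gets a 1 in its category's column.
df = pd.DataFrame({'Shape': ['round', 'oval', 'round']})
onehot = pd.get_dummies(df['Shape'], prefix='Shape')
print(onehot.columns.tolist())  # ['Shape_oval', 'Shape_round']
```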

In [5]:
def Feat_Columns(Inp, Numeric = None, disp_dtype = False):
    '''
    Feature Columns function
    Input: Dataset
    Output: Tensorflow Feature Column List
    '''
    if Numeric is None:
        Numeric = ['int64', 'int32', 'float64', 'float32']
    Temp = Inp.dtypes.reset_index(drop = False)
    Temp.columns = ['Features', 'Data Type']
    Temp['Data Type'] = Temp['Data Type'].astype(str)
    # Numeric_Columns
    Numeric_Columns = Temp.loc[Temp['Data Type'].isin(Numeric), 'Features'].tolist()
    # Categorical_Columns
#     Categorical_Columns = Temp.loc[(~Temp['Data Type'].isin(Numeric)), 'Features'].tolist()
    Categorical_Columns = Temp.loc[Temp['Data Type'] == 'object','Features'].tolist()
    if disp_dtype:
        display(pd.DataFrame({'Numeric Columns': [', '.join(Numeric_Columns)],
                  'Categorical Columns': [', '.join(Categorical_Columns)]}, index = ['Columns']).T.style)
    # Feature Columns
    feature_columns = []
    if len(Categorical_Columns)>0:
        for feature_name in Categorical_Columns:
            vocabulary = Inp[feature_name].unique()
            feature_columns.append(tf.feature_column.indicator_column(\
                                      tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary)))
    if len(Numeric_Columns)>0:
        for feature_name in Numeric_Columns:
            feature_columns.append(tf.feature_column.numeric_column(feature_name))
    return feature_columns
The input function expects the data as a pair (features, label) in which:
  • features - A Python dictionary in which:
    • Each key is the name of a feature.
    • Each value is an array containing all of that feature's values.
  • label - An array containing the values of the label for every example.
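For a pandas DataFrame, this features dictionary can be produced with to_dict(orient='list'), which the input function below uses. A small illustration with two of the dataset's columns:

```python
import pandas as pd

# Each column becomes a key mapping to the list of that feature's values.
df = pd.DataFrame({'Mean Radius': [17.99, 20.57], 'Mean Texture': [10.38, 17.77]})
features = df.to_dict(orient='list')
print(features)
# {'Mean Radius': [17.99, 20.57], 'Mean Texture': [10.38, 17.77]}
```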
In [6]:
def make_input_fn(X, y, inmemory_train = False, n_epochs= None, shuffle=True, batch_size = 256):
    if inmemory_train:
        # In-memory training: return the entire dataset at once (no batching).
        y = np.expand_dims(y, axis=1)
        def input_fn():
            return dict(X), y
    else:
        def input_fn():
            dataset = tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'), y))
            if shuffle:
                dataset = dataset.shuffle(1000)
            dataset = dataset.repeat(n_epochs).batch(batch_size)
            return dataset
    return input_fn

Building the input pipeline

In [7]:
my_feature_columns = Feat_Columns(X)
# Training and evaluation input functions.
train_input_fn = make_input_fn(X_train, y_train)
eval_input_fn = make_input_fn(X_test, y_test, shuffle=False, n_epochs=1)

Modeling: Boosted Trees Classifier

In [8]:
from IPython.display import clear_output
from timeit import default_timer as timer

# Classifier
tf.keras.backend.clear_session()
IT = int(1e3)
params = {'n_trees': 50, 'max_depth': 3, 'n_batches_per_layer': 1, 'center_bias': True}
classifier =  tf.estimator.BoostedTreesClassifier(my_feature_columns, **params)
# Train model.
start = timer()
classifier.train(train_input_fn, max_steps = IT)
CPU_Time = timer() - start
# Evaluation.
results = classifier.evaluate(eval_input_fn)
clear_output()
results['CPU Time'] = CPU_Time
display(pd.DataFrame(results, index = ['']).round(4))
accuracy               0.9649
accuracy_baseline      0.6257
auc                    0.9947
auc_precision_recall   0.9967
average_loss           0.0945
label/mean             0.6257
loss                   0.0945
precision              0.955
prediction/mean        0.6404
recall                 0.9907
global_step            215
CPU Time               9.5161

ROC Curves

In [9]:
from HD_DeepLearning import ROC_Curve

# converting y_test to categorical
y_test_cat = tf.keras.utils.to_categorical(y_test, num_classes = len(Labels_dict), dtype='float32')

pred_dicts = list(classifier.predict(input_fn=eval_input_fn))
clear_output()
probs = np.array([pred['probabilities'] for pred in pred_dicts])    
ROC_Curve(y_test_cat, probs, n_classes = len(Labels_dict), FS = 8)

Confusion Matrix

The confusion matrix allows for visualization of the performance of an algorithm. Note that given the small size of this dataset, we do not report a cross-validated evaluation here; in general, cross-validation is the preferred way to estimate these metrics.
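The matrices themselves can be computed with scikit-learn's confusion_matrix. A minimal sketch on toy labels:

```python
from sklearn.metrics import confusion_matrix

# Rows are true labels, columns are predicted labels.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [1 2]]
```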

In [10]:
import matplotlib.pyplot as plt
import seaborn as sns

def Confusion_Mat(CM_Train, CM_Test, PD, n_splits = 10):
    if n_splits is None:
        Titles = ['Train Set', 'Test Set']
    else:
        Titles = ['Train Set (CV = %i)' % n_splits, 'Test Set (CV = %i)' % n_splits]
    CM = [CM_Train, CM_Test]
    Cmap = ['Greens', 'YlGn','Blues', 'PuBu']
    for i in range(2):
        fig, ax = plt.subplots(1, 2, figsize= PD['FS'])
        fig.suptitle(Titles[i], weight = 'bold', fontsize = 16)
        _ = sns.heatmap(CM[i], annot=True, annot_kws={"size": PD['annot_kws']}, cmap=Cmap[2*i], ax = ax[0],
                        linewidths = 0.2, cbar_kws={"shrink": PD['shrink']})
        _ = ax[0].set_title('Confusion Matrix');
        Temp = np.round(CM[i].astype('float') / CM[i].sum(axis=1)[:, np.newaxis], 2)
        _ = sns.heatmap(Temp,
                        annot=True, annot_kws={"size": PD['annot_kws']}, cmap=Cmap[2*i+1], ax = ax[1],
                       linewidths = 0.4, vmin=0, vmax=1, cbar_kws={"shrink": PD['shrink']})
        _ = ax[1].set_title('Normalized Confusion Matrix');

        for a in ax:
            _ = a.set_xlabel('Predicted labels')
            _ = a.set_ylabel('True labels'); 
            _ = a.xaxis.set_ticklabels(PD['Labels'])
            _ = a.yaxis.set_ticklabels(PD['Labels'])
            _ = a.set_aspect(1)
Train in Memory

An alternative way to speed up training is the train_in_memory option. However, when training time is not a concern, training without this option is recommended [2]. Furthermore, in our experiments, train_in_memory did not always improve training performance.

In [11]:
in_memory_params = dict(params)
in_memory_params['n_batches_per_layer'] = 1
# In-memory input_fn does not use batching.
train_input_fn = make_input_fn(X_train, y_train, inmemory_train = True)

# Classifier
tf.keras.backend.clear_session()
classifier =  tf.estimator.BoostedTreesClassifier(my_feature_columns, train_in_memory=True, **in_memory_params)
# Train model.
start = timer()
classifier.train(train_input_fn, max_steps = IT)
CPU_Time = timer() - start
# Evaluation.
results = classifier.evaluate(eval_input_fn)
clear_output()
results['CPU Time'] = CPU_Time
display(pd.DataFrame(results, index = ['']).round(4))
accuracy               0.9649
accuracy_baseline      0.6257
auc                    0.9773
auc_precision_recall   0.9846
average_loss           0.1447
label/mean             0.6257
loss                   0.1447
precision              0.955
prediction/mean        0.64
recall                 0.9907
global_step            153
CPU Time               9.7063

ROC Curves

In [12]:
# converting y_test to categorical
y_test_cat = tf.keras.utils.to_categorical(y_test, num_classes = len(Labels_dict), dtype='float32')

pred_dicts = list(classifier.predict(input_fn=eval_input_fn))
clear_output()
probs = np.array([pred['probabilities'] for pred in pred_dicts])    
ROC_Curve(y_test_cat, probs, n_classes = len(Labels_dict), FS = 8)

Feature Importance

We can also investigate which features contribute most to the classifier's predictions. This is similar to scikit-learn's feature importances and follows the approach outlined in [6].

In [13]:
pred_dicts = list(classifier.experimental_predict_with_explanations(eval_input_fn))
clear_output()

# Create DFC Pandas dataframe.
labels = y_test
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
df_dfc = pd.DataFrame([pred['dfc'] for pred in pred_dicts])
df_dfc.columns = [x.replace('_',' ') for x in df_dfc.columns]
display(df_dfc.describe().T.style.background_gradient(subset= ['mean'], cmap='RdYlGn')\
        .background_gradient(subset= ['std'], cmap='RdYlGn')\
        .background_gradient(subset= ['min'], cmap='hot')\
        .background_gradient(subset= ['max'], cmap='winter')
        .format(precision=4).format({'count': "{:.0f}"}))
  count mean std min 25% 50% 75% max
Worst Concave Points 171 0.034192 0.094021 -0.205263 -0.083804 0.093184 0.095550 0.209183
Worst Texture 171 0.012124 0.041562 -0.100195 -0.001904 0.016808 0.021577 0.250249
Worst Perimeter 171 0.001343 0.064912 -0.129914 -0.080307 0.047117 0.054797 0.097532
Concavity Error 171 -0.010531 0.016965 -0.067898 -0.029514 0.000000 0.000000 0.045044
Mean Concave Points 171 0.006037 0.049443 -0.228195 -0.038724 0.025016 0.027949 0.196439
Worst Smoothness 171 -0.035540 0.049082 -0.203640 -0.092039 -0.003195 0.001325 0.098780
Worst Concavity 171 -0.013156 0.022748 -0.074619 -0.038301 -0.001749 0.003068 0.049867
Mean Texture 171 -0.039711 0.066935 -0.227935 -0.121613 -0.001024 0.008020 0.185712
Worst Radius 171 0.026134 0.075031 -0.169159 -0.064681 0.078385 0.079460 0.125474
Worst Area 171 -0.004337 0.042053 -0.146367 -0.034422 0.021547 0.024880 0.088355
Mean Compactness 171 0.001090 0.008455 -0.047627 -0.000728 0.000275 0.001981 0.044819
Worst Compactness 171 0.010642 0.014845 -0.051928 -0.002935 0.020337 0.022721 0.040747
Area Error 171 0.004360 0.033624 -0.125323 -0.033358 0.025121 0.026144 0.105666
Mean Smoothness 171 -0.000436 0.009291 -0.043719 -0.001080 -0.000493 0.000214 0.053844
Worst Symmetry 171 0.004212 0.009183 -0.006953 0.000018 0.001450 0.003111 0.064156
Symmetry Error 171 -0.000077 0.002754 -0.021032 0.000000 0.000000 0.000091 0.011241
Compactness Error 171 0.002332 0.014732 -0.058353 -0.000021 0.000480 0.001409 0.082985
Mean Concavity 171 0.001567 0.007483 -0.015696 -0.000203 0.000205 0.000647 0.052443
Mean Area 171 -0.000068 0.003819 -0.016273 -0.000190 0.000080 0.000214 0.034322
Smoothness Error 171 0.006861 0.012611 -0.043151 0.000000 0.005294 0.007006 0.079252
Radius Error 171 0.000179 0.000908 -0.006015 0.000005 0.000022 0.000089 0.003331
Texture Error 171 -0.000100 0.003165 -0.037027 0.000000 0.000011 0.000039 0.006850
Concave Points Error 171 0.000579 0.003112 -0.000841 0.000000 0.000000 0.000025 0.025199
Fractal Dimension Error 171 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Mean Fractal Dimension 171 0.000732 0.005554 -0.010563 0.000000 0.000000 0.000000 0.046179
Mean Perimeter 171 0.000430 0.004037 -0.024298 -0.000008 0.000000 0.000624 0.022747
Mean Radius 171 -0.000101 0.001316 -0.017205 0.000000 0.000000 0.000000 0.000000
Mean Symmetry 171 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Perimeter Error 171 0.004431 0.007246 -0.045032 0.000000 0.005046 0.007010 0.038531
Worst Fractal Dimension 171 -0.001333 0.011864 -0.060120 -0.000699 -0.000028 0.000131 0.063465

A nice property of DFCs is that the sum of the contributions + the bias is equal to the prediction for a given example.

In [14]:
# Sum of DFCs + bias == probability.
bias = pred_dicts[0]['bias']
dfc_prob = df_dfc.sum(axis=1) + bias
np.testing.assert_almost_equal(dfc_prob.values, probs.values)

Feature Importance for a patient

Plot the DFCs for an individual patient, color-coded by the direction of each contribution, and annotate the figure with the corresponding feature values.

In [15]:
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
def _add_feature_values(feature_values, ax, colors):
    """Display feature's values on left of plot."""
    x_coord = ax.get_xlim()[0]
    OFFSET = 0.15
    for y_coord, (feat_name, feat_val) in enumerate(feature_values.items()):
        t = plt.text(x_coord, y_coord - OFFSET, '{}'.format(feat_val), size=12)
        t.set_bbox(dict(facecolor= colors[y_coord], alpha=0.25))
    font = FontProperties()
#     font.set_weight('bold')
    t = plt.text(x_coord, y_coord + 1 - OFFSET, 'Feature\nValue', fontproperties=font, size=13)

def _xLims(ax):
    Temp = np.linspace(-1,1,21, endpoint=True)
    Temp = np.round(Temp,1)
    xlims = ax.get_xlim()
    for l, r in list(zip(Temp[:-1],Temp[1:])):
        if l<= xlims[0] < r:
            Left = l
        if l<= xlims[1] < r:
            Right = r
    return [Left, Right]

def Plot_Example(example, TOP_N = 10, Pos_Color = 'LimeGreen', Neg_Color = 'OrangeRed', Maps = None, FS = (13, 7)):
    # Sorting by absolute value
    sorted_ix = example.abs().sort_values()[-TOP_N:].index
    example = example[sorted_ix]
    
    fig, ax = plt.subplots(1, 1, figsize= FS)
    Temp = example.to_frame('Value').sort_index(ascending= False)
    Temp0 = Temp.copy(); Temp0[Temp0 < 0] = np.nan
    _ = Temp0.plot(kind='barh', color= Pos_Color, edgecolor = 'white', hatch = '///', legend=None, alpha=0.75, ax = ax)
    Temp0 = Temp.copy(); Temp0[Temp0 >= 0] = np.nan
    _ = Temp0.plot(kind='barh', color= Neg_Color, edgecolor = 'white', hatch = '///', legend=None, alpha=0.75, ax = ax)
    _ = Temp.plot(kind='barh', color='None', edgecolor = 'Black', legend=None, alpha=1, lw=1.2, ax = ax)
    del Temp, Temp0
    _ = ax.grid(False, axis='y')
    # x axis Limits
    _ = ax.set_xlim(_xLims(ax))
    
    # Add feature values.
    Temp = X_test.copy()
    Temp.columns = [x.replace('_',' ') for x in Temp.columns]
    if Maps is not None:
        for c in Maps.keys():
            Temp[c] = Temp[c].map(Maps[c])
  
    colors = example.map(lambda x: Pos_Color if x >= 0 else Neg_Color).tolist()
    _add_feature_values(Temp.iloc[ID][sorted_ix].round(4), ax, colors)  # ID: global example index
    return ax
In [16]:
ID = 61
Tops = X_train.shape[1]
ax = Plot_Example(df_dfc.iloc[ID], TOP_N = Tops, FS = (13, 16))
_ = ax.set_title('Feature contributions for example patient {} from the Test set\n Pred: {:1.2f}; Label: {}'
                 .format(ID, probs[ID], labels.iloc[ID]))
_ = ax.set_xlabel('Contribution to Predicted Probability', size=14)

References

  1. Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.J., Sandhu, S., Guppy, K.H., Lee, S. and Froelicher, V., 1989. International application of a new probability algorithm for the diagnosis of coronary artery disease. The American journal of cardiology, 64(5), pp.304-310.

  2. Aha, D. and Kibler, D., 1988. Instance-based prediction of heart-disease presence with the Cleveland database. University of California, 3(1), pp.3-2.

  3. Gennari, J.H., Langley, P. and Fisher, D., 1989. Models of incremental concept formation. Artificial intelligence, 40(1-3), pp.11-61.

  4. Regression analysis Wikipedia page
  5. Tensorflow tutorials
  6. TensorFlow Boosted Trees Classifier
  7. Lasso (statistics) Wikipedia page
  8. Tikhonov regularization Wikipedia page
  9. Palczewska A., Palczewski J., Marchese Robinson R., Neagu D. (2014) Interpreting Random Forest Classification Models Using a Feature Contribution Method. In: Bouabana-Tebibel T., Rubin S. (eds) Integration of Reusable Systems. Advances in Intelligent Systems and Computing, vol 263. Springer, Cham
  10. S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).
  11. S. Aeberhard, D. Coomans and O. de Vel, “THE CLASSIFICATION PERFORMANCE OF RDA” Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).